Table of Contents

  1. Data
  2. Model
  3. Loss
  4. Training Loop
  5. Hyperparameter Tuning
  6. Training and validation of CycleGAN
  7. Conclusion
  8. References

Project Topic

This is a notebook for "I'm Something of a Painter Myself". The task of this project is to convert landscape images into Monet-style images. It is an unpaired image-to-image translation task, which calls for training a CycleGAN.

Target

The target of this project is to achieve FID < 100 on the Kaggle leaderboard. In addition, I would like to visually check how my CycleGAN works, so the following objectives should be achieved as well.

1. Data

There are 300 Monet images and 7038 Landscape images. Each image is 256x256x3.

Data source

The original data is available on the Kaggle competition website.

https://www.kaggle.com/competitions/gan-getting-started/data

Functions

convert_img converts a normalized image back to its original scale. plot_photo visualizes 36 images. plot_OneLine plots 6 images in one line.

Reading images

I usually use OpenCV for reading images. Since OpenCV reads images in BGR channel order, they have to be converted to RGB; thus, cv2.imread is followed by [:,:,::-1]. Only 100 Landscape images are read at the beginning to reduce RAM usage. They are used only for validation during CycleGAN training.
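The channel reversal can be illustrated with a NumPy sketch; the array below is a hypothetical stand-in for what cv2.imread would return:

```python
import numpy as np

# Hypothetical stand-in for an image loaded by cv2.imread (BGR channel order).
# The real call in the notebook would be: img = cv2.imread(path)[:, :, ::-1]
img_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
img_bgr[..., 0] = 255           # pure blue: blue sits first in BGR order

img_rgb = img_bgr[:, :, ::-1]   # reverse the last (channel) axis: BGR -> RGB
# After the reversal the blue value sits in the last position, as RGB expects.
```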

Images are normalized by the following code.
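The exact scaling is not restated here; a minimal sketch of one common CycleGAN convention (scaling pixels to [-1, 1], which is an assumption, with convert_img as the inverse) looks like this:

```python
import numpy as np

def normalize_img(img):
    """Scale uint8 pixels [0, 255] to float32 [-1, 1] (assumed convention)."""
    return img.astype(np.float32) / 127.5 - 1.0

def convert_img(img):
    """Inverse of normalize_img: back to uint8 [0, 255] for plotting."""
    return np.rint((img + 1.0) * 127.5).astype(np.uint8)

# Round trip: normalizing and converting back recovers the original pixels.
img = np.array([[[0, 128, 255]]], dtype=np.uint8)
assert convert_img(normalize_img(img)).tolist() == img.tolist()
```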

Monet images

Landscape images

2. Model

2.1 GAN Components

In CycleGAN, the generators and discriminators share some common components. It is convenient to define those components as functions.

FeatureMapBlock is a feature extractor that does not change the image size. It is also used at the end of the generator to improve the quality of generated images. ContractingBlock consists of Conv2D, InstanceNormalization, and an activation function, and reduces the image size just like a typical CNN. ResidualBlock has two layers of Conv2D and InstanceNormalization; the original input is added back at the end of the block via a skip connection, which mitigates the dead-neuron problem. ExpandingBlock consists of Conv2DTranspose, which increases the image size, followed by InstanceNormalization and an activation function. For the CycleGAN model, InstanceNormalization is selected instead of BatchNormalization, because the batch size is 1 in CycleGAN training.
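The ResidualBlock idea can be sketched conceptually in NumPy; the 1x1 "conv" below is a toy stand-in for the real Conv2D layers, not the notebook's implementation:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def instance_norm(x, eps=1e-5):
    # Normalize each channel of a single image over its spatial dims (H, W);
    # unlike batch norm, no statistics are shared across a batch.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, layer):
    """Two conv-like layers with InstanceNorm; the input is added back at the end."""
    h = leaky_relu(instance_norm(layer(x)))
    h = instance_norm(layer(h))
    return x + h   # skip connection: the signal can bypass the conv layers

# Toy "conv" layer: a 1x1 convolution expressed as a per-pixel linear map.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3))
conv1x1 = lambda x: x @ w

x = rng.normal(size=(8, 8, 3))
y = residual_block(x, conv1x1)   # same spatial size as the input
```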

2.2. Generator

The generator consists of the following elements:

This structure is common to both generators, AB and BA.

2.3 Discriminator

The discriminator consists of the following elements:

This structure is common to both discriminators, A and B.

3. Loss

3.1 Discriminator Loss

get_disc_loss calculates the adversarial loss of a discriminator. It takes the discriminator's predictions on real and fake images and computes the loss based on adv_criterion.
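A minimal NumPy sketch of this function, assuming BCE as adv_criterion (the signature and internals are illustrative, not the notebook's exact code):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy on sigmoid probabilities."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def get_disc_loss(real_pred, fake_pred, adv_criterion=bce):
    """Discriminator loss: push real predictions toward 1, fake toward 0."""
    real_loss = adv_criterion(real_pred, np.ones_like(real_pred))
    fake_loss = adv_criterion(fake_pred, np.zeros_like(fake_pred))
    return (real_loss + fake_loss) / 2

# A near-perfect discriminator (real ~ 1, fake ~ 0) gets a near-zero loss.
loss = get_disc_loss(np.array([0.99]), np.array([0.01]))
```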

3.2 Generator Loss

There are three types of generator losses. First, get_gen_adversarial_loss calculates the adversarial loss of a generator. It takes the discriminator's predictions on fake images; the true label is always 1 in this case. The loss is then calculated based on adv_criterion.

get_cycle_consistency_loss compares the cycled image with the original one. Typically, MAE is selected as cycle_criterion. get_identity_loss uses a similar criterion: it measures the gap between the original image and the generated image. For example, if the original is a Monet, we need to make sure the Monet image does not change after going through generator_Landscape_Monet (gen_BA).
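The three generator losses can be sketched as follows, with MAE as the cycle and identity criterion (function bodies are a conceptual sketch, not the notebook's code):

```python
import numpy as np

def mae(a, b):
    return np.mean(np.abs(a - b))

def get_gen_adversarial_loss(fake_pred, adv_criterion):
    # The generator wants the discriminator to label its fakes as real (1).
    return adv_criterion(fake_pred, np.ones_like(fake_pred))

def get_cycle_consistency_loss(real, cycled, cycle_criterion=mae):
    # real -> gen_AB -> gen_BA should reproduce the original image.
    return cycle_criterion(real, cycled)

def get_identity_loss(real, identity, identity_criterion=mae):
    # E.g. a Monet image fed through gen_BA (Landscape -> Monet) should not change.
    return identity_criterion(real, identity)

img = np.ones((4, 4, 3))
perfect_cycle = get_cycle_consistency_loss(img, img)   # 0: cycle reproduced exactly
```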

4. Training Loop

Optimizers

Training a CycleGAN means training four models: generators AB and BA, and discriminators A and B. Therefore, it needs four optimizers.

To monitor CycleGAN training progress, losses are stored in the following lists.

Loss functions

For the adversarial loss, either BCE or MSE is selected. For the other criteria, MAE is the choice.

train_CycleGan trains the discriminators and generators separately. To keep the balance between generator and discriminator, it skips the discriminator update when the discriminator loss is below a threshold. After training the discriminators, it trains the generators.
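The balancing rule can be sketched as a simple gate; the threshold value below is a made-up placeholder, not the notebook's actual setting:

```python
# Skip the discriminator update when its loss is already low, so the
# generator gets a chance to catch up instead of being overpowered.
DISC_LOSS_THRESHOLD = 0.3  # hypothetical value for illustration

def should_train_disc(disc_loss, threshold=DISC_LOSS_THRESHOLD):
    """Gate used inside the training step: update discriminator only if True."""
    return disc_loss >= threshold

strong_disc = should_train_disc(0.1)   # False: discriminator too strong, skip
weak_disc = should_train_disc(0.7)     # True: discriminator still weak, update
```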

The Train_OneEpoch function runs train_CycleGan in a for loop. It reads a Monet image and a Landscape image and feeds them into training. It also applies horizontal flips to Monet images as data augmentation, because there are only 300 Monet images. At every training step it randomly selects a Monet and a Landscape image, so the combination of the two images changes on every training iteration.
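The random pairing and flip augmentation can be sketched as follows; the arrays are random stand-ins for the real images, and the 50% flip probability is an assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
monet = rng.random((300, 8, 8, 3))       # stand-ins for the 300 Monet images
landscape = rng.random((100, 8, 8, 3))   # stand-ins for the loaded Landscapes

def sample_pair():
    """Pick a random Monet/Landscape pair; flip the Monet horizontally 50% of the time."""
    m = monet[rng.integers(len(monet))]
    l = landscape[rng.integers(len(landscape))]
    if rng.random() < 0.5:
        m = m[:, ::-1, :]   # horizontal flip: reverse the width axis
    return m, l

m, l = sample_pair()   # a fresh combination on every call
```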

check_output visualizes the training progress after each epoch. It shows the original Landscape image, the converted image, the cycled image, and the identity image.

5. Hyperparameter Tuning

The following hyperparameters were tuned in previous versions of this notebook. The choices are also based on Coursera's GAN Specialization [1] and the CycleGAN paper [2].

Activation Function

I selected LeakyReLU for all activation functions in the generator and discriminator. The GAN Specialization [1] uses ReLU in the generator's ContractingBlock and ResidualBlock, but ReLU turned dark parts of generated images completely black ("blackout"), because it cuts off small values. In contrast, LeakyReLU preserved the information in dark zones, and the generated images had no blackouts.
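The difference is easy to see on small negative activations; alpha=0.2 below is a common slope choice, assumed here:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.2):  # alpha=0.2 is an assumed, commonly used slope
    return np.where(x > 0, x, alpha * x)

# Small negative activations (dark pixels) are zeroed by ReLU ("blackout"),
# but are kept, scaled by alpha, under LeakyReLU.
dark = np.array([-0.5, -0.1, 0.3])
zeroed = relu(dark)        # dark-zone values collapse to 0
kept = leaky_relu(dark)    # dark-zone values survive, scaled down
```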

Number of ContractingBlocks in generator

With 2 ContractingBlocks, training was stable and fast. Increasing the number of ContractingBlocks did not improve the result; it ended up with blurred images after more training epochs.

Number of ResidualBlocks in generator

The final model uses 3 ResidualBlocks. Even with more than 3 ResidualBlocks, the generated images looked similar.

Kernel size of discriminator

Kernel size = 5 was more stable than 4, which is what I had seen in the GAN Specialization [1].

Learning rate

The learning rate is set to 0.0002, as per the CycleGAN paper [2]. Increasing the learning rate did not improve the result.

Adversarial Loss Function

The GAN Specialization [1] recommends MSE for the adversarial loss to avoid vanishing gradients. However, BCE worked slightly better than MSE for the GAN architecture of this project, so BCE is selected for the adversarial loss. Furthermore, BCE makes it easier to monitor the occurrence of overfitting.

6. Training and validation of CycleGAN

During training, the CycleGAN is validated visually by plotting the fake image, cycled image, and identity image, checking image structure, color, texture, etc. Furthermore, the training loss is plotted to make sure mode collapse (overfitting) does not occur.

Training Loss

Final Output

36 samples of the final output are shown below. They have different colors from the original images.

Write File

7. Conclusion

This model achieved FID 74.5 in version 24 of this notebook, meeting the project target of FID < 100. The final output has faded colors, different from the original images. The texture looks slightly rougher than the original, yet does not look like Monet. However, I could verify that CycleGAN worked as it should:

Other takeaways

To implement CycleGAN, I studied the GAN Specialization [1]. The problem was that all of its assignments are in PyTorch, while I prefer coding in TensorFlow. Therefore, I had to re-code everything in TensorFlow, which was a good exercise to verify my understanding.

8. References

[1] Apply Generative Adversarial Networks (GANs). DeepLearning.AI.

https://www.coursera.org/learn/apply-generative-adversarial-networks-gans?specialization=generative-adversarial-networks-gans

[2] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (Zhu, Park, Isola, and Efros, 2017):

https://arxiv.org/abs/1703.10593